Metaphor Identification in Large Texts Corpora
Abstract
Identifying metaphorical language use (e.g., sweet child) is one of the challenges facing natural language processing. This paper describes three novel algorithms for automatic metaphor identification, all variations of the same core algorithm. We evaluate the algorithms on two corpora of Reuters and New York Times articles. The paper presents the most comprehensive study of metaphor identification in terms of the scope of metaphorical phrases and the size of the annotated corpora. The algorithms' performance in identifying linguistic phrases as metaphorical or literal was compared with human judgment. Overall, the algorithms outperform the state-of-the-art algorithm, with 71% precision and an average 27% improvement in prediction over the base rate of metaphors in the corpus.
Similar resources
A Study of Ideational Grammatical Metaphor in Health Texts of English Newspapers
Systemic functional grammar constructs a grammar for the purpose of text analysis to investigate how grammar is used as a means of making meaning. Grammatical metaphor is one of the language phenomena introduced by Halliday (2004) in the framework of functional grammar. The present work focuses on the application of Halliday’s metafunctional framework in health texts of English newspapers. The ...
A Comparative Study of Ideational Grammatical Metaphor in Scientific and Political Texts
Language, science and politics go together, and to learn these genres is to learn a language created for codifying, extending and transmitting scientific and political knowledge. Grammatical metaphor is divided into two broad areas: ideational and interpersonal. This paper focuses on the first type, i.e. Ideational Grammatical Metaphor (IGM), which includes process types and nominalization. The m...
Measuring Interlanguage: Native Language Identification with L1-influence Metrics
The task of native language (L1) identification suffers from a relative paucity of useful training corpora, and standard within-corpus evaluation is often problematic due to topic bias. In this paper, we introduce a method for L1 identification in second language (L2) texts that relies only on much more plentiful L1 data, rather than the L2 texts that are traditionally used for training. In par...
Syntactic Complexity of Russian Unified State Exam Texts in English: A Study on Reliability and Validity
In this study we analyze texts used in the Russian Unified State Exam (USE) in English. The texts, which form two small research corpora, were retrieved from two sources: the official USE database, as a reference point, and “Neznaika” (https://neznaika.pro/), a popular website used by pupils for USE training. The two corpora are balanced in size: USE has 11934 tokens and “Neznaika” has 11918 tokens. We share Biber’...
A lexicon of perception for the identification of synaesthetic metaphors in corpora
Synaesthesia is a type of metaphor associating linguistic expressions that refer to two different sensory modalities. Previous studies, based on the analysis of poetic texts, have shown that synaesthetic transfers tend to go from the lower toward the higher senses (e.g., sweet music vs. musical sweetness). In non-literary language synaesthesia is rare, and finding a sufficient number of example...